DotStar: Breaking the Scalability and Performance Barriers in Regular Expression Set Matching

Authors

  • Davide Pasetto
  • Fabrizio Petrini
Abstract

Regular expressions are widely used to parse data and to detect recurrent patterns and information: they are a common choice for defining configurable rules for a variety of systems. In fact, many data-intensive applications rely on regular expression parsing as the first line of defense to perform on-line data filtering. Unfortunately, few solutions can keep up with the increasing data rates and the complexity posed by several hundreds of regular expressions. In this paper, we present DotStar (.*), a complete software tool-chain that can compile a set of regular expressions into an automaton that is highly optimized to run on multi-core processors with vector/SIMD extensions. DotStar relies on several algorithmic breakthroughs to transform the user-provided regular expressions into a sequence of more manageable intermediate representations. The resulting automaton is both space and time efficient, and can perform the search in a single pass, without backtracking. The experimental evaluation, performed on a state-of-the-art Intel quad-core processor, shows that DotStar can efficiently handle both small sets of regular expressions, such as those used in protocol parsing, and much larger sets like the ones designed for Network Intrusion Detection Systems (NIDS). The experimental results show processing rates ranging from 2.2 Gbits/sec with the more demanding sets of NIDS expressions to 5 Gbits/sec with XML parsing, a speedup of almost two orders of magnitude over popular libraries such as Boost, reaching, and in some cases exceeding, the performance of specialized ASICs and FPGAs.
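
To make "a single pass, without backtracking" concrete, the following is a minimal sketch of set matching by simulating sets of automaton states. It is not DotStar's algorithm or intermediate representation, which the abstract does not detail. It assumes a tiny regex subset (literal characters, `.`, and a postfix `*` on a single item), and the names `parse`, `closure`, `step`, and `match_set` are hypothetical, chosen for illustration only.

```python
def parse(pattern):
    """Parse a tiny regex subset into a list of (char, starred) items.

    Assumed subset: literal characters, '.', and a postfix '*' on one item.
    """
    items = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        items.append((pattern[i], starred))
        i += 2 if starred else 1
    return items


def closure(items, positions):
    """Add every position reachable by skipping starred items (zero repeats)."""
    result = set()
    stack = list(positions)
    while stack:
        p = stack.pop()
        if p in result:
            continue
        result.add(p)
        if p < len(items) and items[p][1]:
            stack.append(p + 1)
    return result


def step(items, positions, c):
    """Advance all active positions on input character c at once."""
    nxt = set()
    for p in positions:
        if p == len(items):  # accepting position has no outgoing transition
            continue
        ch, starred = items[p]
        if ch == "." or ch == c:
            nxt.add(p if starred else p + 1)
    return closure(items, nxt)


def match_set(patterns, text):
    """Scan text once; return {pattern: offset where its first match ends}."""
    compiled = [parse(p) for p in patterns]
    active = [closure(items, {0}) for items in compiled]
    first_end = {}
    for k, items in enumerate(compiled):
        if len(items) in active[k]:  # pattern matches the empty string
            first_end[patterns[k]] = 0
    for offset, c in enumerate(text, start=1):
        for k, items in enumerate(compiled):
            states = step(items, active[k], c)
            if len(items) in states and patterns[k] not in first_end:
                first_end[patterns[k]] = offset
            # Unanchored search: a new match may start at the next character.
            active[k] = states | closure(items, {0})
    return first_end


if __name__ == "__main__":
    rules = ["ab*c", "GET .*HTTP", "xyz"]
    print(match_set(rules, "xxac GET /index HTTP/1.1"))
```

Each input character is read exactly once and the matcher never rewinds, because every alternative is tracked simultaneously as a set of positions. A production tool-chain would instead merge all rules into one optimized automaton, as the abstract describes; this sketch keeps one state set per rule for readability.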

Similar resources

Practical regular expression matching free of scalability and performance barriers

Kai Wang, Zhe Fu, Xiaohe Hu, Ju... Corresponding author at: Department of Automation, Tsinghua University, Beijing, China. http://dx.doi.org/10.1016/j.comcom.2014.08.005. ISSN 0140-3664. 2014 Elsevier B.V. All rights reserved.

Selective Regular Expression Matching

Signature-based intrusion detection is one of the most commonly used techniques in modern intrusion detection systems (IDS). One powerful tool that has gained wide acceptance in IDS signatures over the past several years is the regular expression. However, the performance requirements of traditional methods for matching the incoming events against regular expressions are pr...

Software Toolchain for Large-Scale RE-NFA Construction on FPGA

We present a software toolchain for constructing large-scale regular expression matching (REM) on FPGA. The software automates the conversion of regular expressions into compact and high-performance non-deterministic finite automata (RE-NFA) [17]. Assuming a fixed number of fan-out transitions per state, an n-state m-bytes-per-cycle regular expression matching engine (REME) can be constructed i...

Relationship between Coefficients of Characteristic Polynomial and Matching Polynomial of Regular Graphs and its Applications

ABSTRACT. Suppose G is a graph, A(G) its adjacency matrix and f(G, x)=x^n+a_(n-1)x^(n-1)+... is the characteristic polynomial of G. The matching polynomial of G is defined as M(G, x) = x^n-m(G,1)x^(n-2) + ... where m(G,k) is the number of k-matchings in G. In this paper, we determine the relationship between 2k-th coefficient of characteristic polynomial, a_(2k), and k-th coefficient of matchin...

A GPGPU Implementation of Approximate String Matching with Regular Expression Operators and Comparison with Its FPGA Implementation

In this paper, we propose an efficient GPGPU implementation of an algorithm for approximate string matching with regular expression operators, originally implemented on an FPGA, and compare the GPGPU, FPGA and CPU implementations by experiments. Approximate string matching with regular expression operators is used in various applications, such as full text database search and DNA sequence analy...

Publication date: 2008